Abstract

We give a new treatment of tabular LR parsing, which is an
alternative to Tomita's generalized LR algorithm. The advantage
is twofold. Firstly, our treatment is conceptually more attractive
because it uses simpler concepts, such as grammar transformations
and standard tabulation techniques, also known as chart parsing.
Secondly, the static and dynamic complexity of parsing, both in
space and time, is significantly reduced.

1 Introduction 



The efficiency of LR(k) parsing techniques (Sippu and
Soisalon-Soininen, 1990) is very attractive from the perspective
of natural language processing applications. This has stimulated
the computational linguistics community to develop extensions of
these techniques to general context-free grammar parsing. The
best-known example is generalized LR parsing, also known as
Tomita's algorithm, described by Tomita (1986) and further
investigated by, for example, Tomita (1991) and Nederhof (1994a).
Despite appearances, the graph-structured stacks used to describe
Tomita's algorithm differ very little from parse tables; in other
words, generalized LR parsing is one of the so-called tabular
parsing algorithms, among which also the CYK algorithm (Harrison,
1978) and Earley's algorithm (Earley, 1970) can be found. (Tabular
parsing is also known as chart parsing.)

In this paper we investigate the extension of LR parsing to
general context-free grammars from a more general viewpoint:
tabular algorithms can often be described by the composition of
two constructions. One example is given by Lang (1974) and Billot
and Lang (1989): the construction of pushdown automata from
grammars and the simulation of these automata by means of
tabulation yield different tabular algorithms for different such
constructions. Another example, on which our presentation is
based, was first suggested by Leermakers (1989): a grammar is
first transformed and then a standard tabular algorithm along
with some filtering condition is applied using the transformed
grammar. In our case, the transformation and the subsequent
application of the tabular algorithm result in a new form of
tabular LR parsing.

Our method is more efficient than Tomita's algo- 
rithm in two respects. First, reduce operations are 
implemented in an efficient way, by splitting them 
into several, more primitive, operations (a similar 
idea has been proposed by Kipps (1991) for Tomita's 
algorithm). Second, several paths in the computa- 
tion that must be simulated separately by Tomita's 
algorithm are collapsed into a single computation 
path, using state minimization techniques. Exper- 
iments on practical grammars have indicated that 
there is a significant gain in efficiency, with regard 
to both space and time requirements. 

Our grammar transformation produces a so-called cover for the
input grammar, which together with the filtering condition fully
captures the specification of the method, abstracting away from
algorithmic details such as data structures and control flow.
Since this cover can be easily precomputed, implementing our LR
parser simply amounts to running the standard tabular algorithm.
This is very attractive from an application-oriented perspective,
since many actual systems for natural language processing are
based on these kinds of parsing algorithm.

The remainder of this paper is organized as follows. In Section 2
some preliminaries are discussed. We review the notion of LR
automaton in Section 3 and introduce the notion of 2LR automaton
in Section 4. Then we specify our tabular LR method in Section 5
and provide an analysis of the algorithm in Section 6. Finally,
some empirical results are given in Section 7, and further
discussion of our method is provided in Section 8.

2 Definitions 

Throughout this paper we use standard formal language notation.
We assume that the reader is familiar with context-free grammar
parsing theory (Harrison, 1978).



A context-free grammar (CFG) is a 4-tuple $G = (\Sigma, N, P, S)$,
where $\Sigma$ and $N$ are two finite disjoint sets of terminal
and nonterminal symbols, respectively, $S \in N$ is the start
symbol, and $P$ is a finite set of rules. Each rule has the form
$A \to \alpha$ with $A \in N$ and $\alpha \in V^*$, where $V$
denotes $N \cup \Sigma$. The size of $G$, written $|G|$, is
defined as $\sum_{(A \to \alpha) \in P} |A\alpha|$; by $|\alpha|$
we mean the length of a string of symbols $\alpha$.

We generally use symbols $A, B, C, \ldots$ to range over $N$,
symbols $a, b, c, \ldots$ to range over $\Sigma$, symbols
$X, Y, Z$ to range over $V$, symbols $\alpha, \beta, \gamma, \ldots$
to range over $V^*$, and symbols $v, w, x, \ldots$ to range over
$\Sigma^*$. We write $\varepsilon$ to denote the empty string.

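A CFG is said to be in binary form if
$\alpha \in \{\varepsilon\} \cup V \cup N^2$ for all of its rules
$A \to \alpha$. (The binary form does not limit the (weak)
generative capacity of context-free grammars (Harrison, 1978).)
For technical reasons, we sometimes use the augmented grammar
associated with $G$, defined as
$G^\dagger = (\Sigma^\dagger, N^\dagger, P^\dagger, S^\dagger)$,
where $S^\dagger$, $\triangleright$ and $\triangleleft$ are fresh
symbols, $\Sigma^\dagger = \Sigma \cup \{\triangleright, \triangleleft\}$,
$N^\dagger = N \cup \{S^\dagger\}$ and
$P^\dagger = P \cup \{S^\dagger \to \triangleright S \triangleleft\}$.
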
A pushdown automaton (PDA) is a 5-tuple
$\mathcal{A} = (\Sigma, Q, T, q_{in}, q_{fin})$, where $\Sigma$,
$Q$ and $T$ are finite sets of input symbols, stack symbols and
transitions, respectively; $q_{in} \in Q$ is the initial stack
symbol and $q_{fin} \in Q$ is the final stack symbol.¹ Each
transition has the form $\delta_1 \stackrel{z}{\mapsto} \delta_2$,
where $\delta_1, \delta_2 \in Q^*$, $1 \leq |\delta_1|$,
$1 \leq |\delta_2| \leq 2$, and $z = \varepsilon$ or $z = a$. We
generally use symbols $q, r, s, \ldots$ to range over $Q$, and the
symbol $\delta$ to range over $Q^*$.

Consider a fixed input string $v \in \Sigma^*$. A configuration of
the automaton is a pair $(\delta, w)$ consisting of a stack
$\delta \in Q^*$ and the remaining input $w$, which is a suffix of
the input string $v$. The rightmost symbol of $\delta$ represents
the top of the stack. The initial configuration has the form
$(q_{in}, v)$, where the stack is formed by the initial stack
symbol. The final configuration has the form
$(q_{in}\, q_{fin}, \varepsilon)$, where the stack is formed by the
final stack symbol stacked upon the initial stack symbol.



¹ We dispense with the notion of state, traditionally
incorporated in the definition of PDA. This does not
affect the power of these devices, since states can be
encoded within stack symbols and transitions.



The application of a transition
$\delta_1 \stackrel{z}{\mapsto} \delta_2$ is described as follows.
If the top-most symbols of the stack are $\delta_1$, then these
symbols may be replaced by $\delta_2$, provided that either
$z = \varepsilon$, or $z = a$ and $a$ is the first symbol of the
remaining input. Furthermore, if $z = a$ then $a$ is removed from
the remaining input. Formally, for a fixed PDA $\mathcal{A}$ we
define the binary relation $\vdash$ on configurations as the least
relation satisfying $(\delta\delta_1, w) \vdash (\delta\delta_2, w)$
if there is a transition
$\delta_1 \stackrel{\varepsilon}{\mapsto} \delta_2$, and
$(\delta\delta_1, aw) \vdash (\delta\delta_2, w)$ if there is a
transition $\delta_1 \stackrel{a}{\mapsto} \delta_2$. The
recognition of a certain input $v$ is obtained if starting from
the initial configuration for that input we can reach the final
configuration by repeated application of transitions, or,
formally, if $(q_{in}, v) \vdash^* (q_{in}\, q_{fin}, \varepsilon)$,
where $\vdash^*$ denotes the reflexive and transitive closure of
$\vdash$.

By a computation of a PDA we mean a sequence
$(q_{in}, v) \vdash (\delta_1, w_1) \vdash \ldots \vdash (\delta_n, w_n)$,
$n \geq 0$. A PDA is called deterministic if for all possible
configurations at most one transition is applicable. A PDA is said
to be in binary form if, for all transitions
$\delta_1 \stackrel{z}{\mapsto} \delta_2$, we have
$|\delta_1| \leq 2$.
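The relation $\vdash$ and the acceptance condition above can be made concrete with a small sketch. The code below is only an illustration under our own assumptions: the toy automaton (stack symbols `A`, `B`, `F`, recognizing $a^n b^n$ with $n \geq 1$) and the function name `accepts` are hypothetical and do not come from the paper. Configurations are explored breadth-first, so nondeterminism is handled by exhaustive search rather than by tabulation.

```python
from collections import deque

# Each transition is (delta1, z, delta2): delta1 is matched against the top of
# the stack, z is an input symbol or None (for the empty string), and delta2
# replaces delta1. This toy automaton is a hypothetical example in binary form.
TRANSITIONS = [
    (("q_in",), "a", ("q_in", "A")),  # push A for the first a
    (("A",), "a", ("A", "A")),        # push A for each further a
    (("A",), "b", ("B",)),            # first b: replace top A by B
    (("A", "B"), "b", ("B",)),        # further b: pop one A below B
    (("B",), None, ("F",)),           # finish: replace B by the final symbol
]


def accepts(transitions, q_in, q_fin, v):
    """Search for a computation from (q_in, v) to (q_in q_fin, empty input).

    Note: this naive search may not terminate for automata that can push
    stack symbols indefinitely without reading input.
    """
    start = ((q_in,), 0)
    final = ((q_in, q_fin), len(v))
    seen, todo = {start}, deque([start])
    while todo:
        stack, pos = todo.popleft()
        if (stack, pos) == final:
            return True
        for delta1, z, delta2 in transitions:
            if len(stack) < len(delta1) or stack[-len(delta1):] != delta1:
                continue  # delta1 is not on top of the stack
            if z is None:
                new_pos = pos
            elif pos < len(v) and v[pos] == z:
                new_pos = pos + 1  # z is consumed from the remaining input
            else:
                continue
            conf = (stack[:-len(delta1)] + delta2, new_pos)
            if conf not in seen:
                seen.add(conf)
                todo.append(conf)
    return False
```

For instance, `accepts(TRANSITIONS, "q_in", "F", "aabb")` explores the computation that ends with stack `q_in F` and no remaining input, which is exactly the final configuration defined above.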

3 LR automata 

Let $G = (\Sigma, N, P, S)$ be a CFG. We recall the notion of LR
automaton, which is a particular kind of PDA. We make use of the
augmented grammar
$G^\dagger = (\Sigma^\dagger, N^\dagger, P^\dagger, S^\dagger)$
introduced in Section 2.

Let $I_{LR} = \{A \to \alpha \bullet \beta \mid (A \to \alpha\beta) \in P^\dagger\}$.
We introduce the function closure from $2^{I_{LR}}$ to $2^{I_{LR}}$
and the function goto from $2^{I_{LR}} \times V$ to $2^{I_{LR}}$.
For any $q \subseteq I_{LR}$, $closure(q)$ is the smallest set such
that

(i) $q \subseteq closure(q)$; and

(ii) $(B \to \alpha \bullet A\beta) \in closure(q)$ and
$(A \to \gamma) \in P^\dagger$ together imply
$(A \to \bullet\, \gamma) \in closure(q)$.

We then define

$$goto(q, X) = \{A \to \alpha X \bullet \beta \mid (A \to \alpha \bullet X\beta) \in closure(q)\}.$$

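We construct a finite set $\mathcal{R}_{LR}$ as the smallest
collection of sets satisfying the conditions:

(i) $\{S^\dagger \to \triangleright \bullet S \triangleleft\} \in \mathcal{R}_{LR}$; and

(ii) for every $q \in \mathcal{R}_{LR}$ and $X \in V$, we have
$goto(q, X) \in \mathcal{R}_{LR}$, provided $goto(q, X) \neq \emptyset$.

Two elements from $\mathcal{R}_{LR}$ deserve special attention:
$q_{in} = \{S^\dagger \to \triangleright \bullet S \triangleleft\}$,
and $q_{fin}$, which is defined to be the unique set in
$\mathcal{R}_{LR}$ containing
$(S^\dagger \to \triangleright S \bullet \triangleleft)$;
in other words, $q_{fin} = goto(q_{in}, S)$.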
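As an illustration of the closure and goto functions and of the construction of the set of LR states, the following sketch computes them for a small grammar. The toy grammar ($S \to S + a \mid a$), the tuple encoding of items as `(lhs, rhs, dot)` and all function names are our own assumptions, chosen only for this example; the strings `">"` and `"<"` stand for the two fresh end markers of the augmented grammar.

```python
from collections import deque

# Hypothetical toy grammar: S -> S + a | a, augmented with S' -> > S <.
RULES = [("S'", (">", "S", "<")), ("S", ("S", "+", "a")), ("S", ("a",))]
NONTERMINALS = {"S"}
V = {"S", "+", "a"}  # grammar symbols of the original (non-augmented) grammar


def closure(q):
    """Smallest superset of q that also contains A -> . gamma whenever some
    item in it has the dot immediately before the nonterminal A."""
    result, todo = set(q), deque(q)
    while todo:
        _, rhs, dot = todo.popleft()
        if dot < len(rhs) and rhs[dot] in NONTERMINALS:
            for lhs, gamma in RULES:
                item = (lhs, gamma, 0)
                if lhs == rhs[dot] and item not in result:
                    result.add(item)
                    todo.append(item)
    return frozenset(result)


def goto(q, x):
    """Items of closure(q) with the dot moved over the symbol x."""
    return frozenset(
        (lhs, rhs, dot + 1)
        for lhs, rhs, dot in closure(q)
        if dot < len(rhs) and rhs[dot] == x
    )


def lr_states(q_in):
    """Smallest collection containing q_in and closed under non-empty goto."""
    states, todo = {q_in}, deque([q_in])
    while todo:
        q = todo.popleft()
        for x in V:
            q2 = goto(q, x)
            if q2 and q2 not in states:
                states.add(q2)
                todo.append(q2)
    return states


Q_IN = frozenset({("S'", (">", "S", "<"), 1)})
STATES = lr_states(Q_IN)
Q_FIN = goto(Q_IN, "S")
```

Note that, as in the definitions above, states are stored as kernel sets (the result of goto) and closure is recomputed on demand; a practical generator would cache it.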



For $A \in N$, an $A$-redex is a string $q_0 q_1 q_2 \cdots q_m$,
$m \geq 0$, of elements from $\mathcal{R}_{LR}$ satisfying the
following conditions:

(i) $(A \to \alpha\, \bullet) \in closure(q_m)$, for some
$\alpha = X_1 X_2 \cdots X_m$; and

(ii) $goto(q_{k-1}, X_k) = q_k$, for $1 \leq k \leq m$.

Note that in such an $A$-redex,
$(A \to \bullet\, X_1 X_2 \cdots X_m) \in closure(q_0)$, and
$(A \to X_1 \cdots X_k \bullet X_{k+1} \cdots X_m) \in q_k$, for
$0 < k \leq m$.

The LR automaton associated with G is now in- 
troduced. 

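Definition 1 $\mathcal{A}_{LR} = (\Sigma, Q_{LR}, T_{LR}, q_{in}, q_{fin})$,
where $Q_{LR} = \mathcal{R}_{LR}$,
$q_{in} = \{S^\dagger \to \triangleright \bullet S \triangleleft\}$,
$q_{fin} = goto(q_{in}, S)$, and $T_{LR}$ contains:

(i) $q \stackrel{a}{\mapsto} q\, q'$, for every $a \in \Sigma$ and
$q, q' \in \mathcal{R}_{LR}$ such that $q' = goto(q, a)$;

(ii) $q\delta \stackrel{\varepsilon}{\mapsto} q\, q'$, for every
$A \in N$, $A$-redex $q\delta$, and $q' \in \mathcal{R}_{LR}$ such
that $q' = goto(q, A)$.

Transitions in (i) above are called shift, transitions
in (ii) are called reduce.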

4 2LR Automata 

The automata $\mathcal{A}_{LR}$ defined in the previous section
are deterministic only for a subset of the CFGs, called the LR(0)
grammars (Sippu and Soisalon-Soininen, 1990), and behave
nondeterministically in the general case. When designing tabular
methods that simulate nondeterministic computations of
$\mathcal{A}_{LR}$, two main difficulties are encountered:

• A reduce transition in $\mathcal{A}_{LR}$ is an elementary
operation that removes from the stack a number of elements
bounded by the size of the underlying grammar. Consequently, the
time requirement of tabular simulation of $\mathcal{A}_{LR}$
computations can be onerous, for reasons pointed out by
Sheil (1976) and Kipps (1991).

• The set $\mathcal{R}_{LR}$ can be exponential in the size of
the grammar (Johnson, 1991). If in such a case the computations
of $\mathcal{A}_{LR}$ touch upon each state, then time and space
requirements of tabular simulation are obviously onerous.

The first issue above is solved here by recasting
$\mathcal{A}_{LR}$ in binary form. This is done by considering
each reduce transition as a sequence of "pop" operations which
affect at most two stack symbols at a time. (See also Lang (1974),
Villemonte de la Clergerie (1993) and Nederhof (1994a), and for LR
parsing specifically Kipps (1991) and Leermakers (1992b).) The
following definition introduces this new kind of automaton.

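Definition 2 $\mathcal{A}'_{LR} = (\Sigma, Q'_{LR}, T'_{LR}, q_{in}, q_{fin})$,
where $Q'_{LR} = \mathcal{R}_{LR} \cup I_{LR}$,
$q_{in} = \{S^\dagger \to \triangleright \bullet S \triangleleft\}$,
$q_{fin} = goto(q_{in}, S)$ and $T'_{LR}$ contains:

(i) $q \stackrel{a}{\mapsto} q\, q'$, for every $a \in \Sigma$ and
$q, q' \in \mathcal{R}_{LR}$ such that $q' = goto(q, a)$;

(ii) $q \stackrel{\varepsilon}{\mapsto} q\, (A \to \alpha\, \bullet)$,
for every $q \in \mathcal{R}_{LR}$ and
$(A \to \alpha\, \bullet) \in closure(q)$;

(iii) $q\, (A \to \alpha X \bullet \beta) \stackrel{\varepsilon}{\mapsto} (A \to \alpha \bullet X\beta)$,
for every $q \in \mathcal{R}_{LR}$ and
$(A \to \alpha X \bullet \beta) \in q$;

(iv) $q\, (A \to \bullet\, \alpha) \stackrel{\varepsilon}{\mapsto} q\, q'$,
for every $q, q' \in \mathcal{R}_{LR}$ and
$(A \to \alpha) \in P^\dagger$ such that $q' = goto(q, A)$.

Transitions in (i) above are again called shift, transitions in
(ii) are called initiate, those in (iii) are called gathering, and
transitions in (iv) are called goto. The role of a reduce step in
$\mathcal{A}_{LR}$ is taken over in $\mathcal{A}'_{LR}$ by an
initiate step, a number of gathering steps, and a goto step.
Observe that these steps involve the new stack symbols
$(A \to \alpha \bullet \beta) \in I_{LR}$ that are distinguishable
from possible stack symbols
$\{A \to \alpha \bullet \beta\} \in \mathcal{R}_{LR}$.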

We now turn to the second above-mentioned problem, regarding the
size of set $\mathcal{R}_{LR}$. The problem is in part solved here
as follows. The number of states in $\mathcal{R}_{LR}$ is
considerably reduced by identifying two states if they become
identical after items $A \to \alpha \bullet \beta$ from $I_{LR}$
have been simplified to only the suffix of the right-hand side
$\beta$. This is reminiscent of techniques of state minimization
for finite automata (Booth, 1967), as they have been applied
before to LR parsing, e.g., by Pager (1970) and Nederhof and
Sarbo (1993).

Let $G^\dagger$ be the augmented grammar associated with a CFG
$G$, and let
$I_{2LR} = \{\beta \mid (A \to \alpha\beta) \in P^\dagger\}$. We
define variants of the closure and goto functions from the
previous section as follows. For any set $q \subseteq I_{2LR}$,
$closure'(q)$ is the smallest set such that

(i) $q \subseteq closure'(q)$; and

(ii) $(A\beta) \in closure'(q)$ and $(A \to \gamma) \in P^\dagger$
together imply $(\gamma) \in closure'(q)$.

Also, we define

$$goto'(q, X) = \{\beta \mid (X\beta) \in closure'(q)\}.$$



We now construct a finite set $\mathcal{R}_{2LR}$ as the smallest
set satisfying the conditions:

(i) $\{S\triangleleft\} \in \mathcal{R}_{2LR}$; and

(ii) for every $q \in \mathcal{R}_{2LR}$ and $X \in V$, we have
$goto'(q, X) \in \mathcal{R}_{2LR}$, provided
$goto'(q, X) \neq \emptyset$.

As stack symbols, we take the elements from $I_{2LR}$ and a
subset of elements from $(V \times \mathcal{R}_{2LR})$:

$$Q_{2LR} = \{(X, q) \mid \exists q' [goto'(q', X) = q]\} \cup I_{2LR}$$

In a stack symbol of the form $(X, q)$, the $X$ serves to record
the grammar symbol that has been recognized last, cf. the symbols
that formerly were found immediately before the dots.
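To see how the suffix-based items merge states, the following sketch computes $closure'$, $goto'$ and the resulting state set for a small grammar. The toy grammar ($S \to S + a \mid a$), the encoding of items as tuples of remaining right-hand-side symbols, and the function names are our own assumptions for illustration only; `"<"` stands for the end marker $\triangleleft$.

```python
from collections import deque

# Hypothetical toy grammar: S -> S + a | a (the augmented rule S' -> > S <
# contributes only the start item ("S", "<") used below).
RULES = [("S", ("S", "+", "a")), ("S", ("a",))]
NONTERMINALS = {"S"}
V = {"S", "+", "a"}


def closure2(q):
    """closure': add the full right-hand side gamma of every rule A -> gamma
    whenever some suffix item in the set starts with the nonterminal A."""
    result, todo = set(q), deque(q)
    while todo:
        beta = todo.popleft()
        if beta and beta[0] in NONTERMINALS:
            for lhs, gamma in RULES:
                if lhs == beta[0] and gamma not in result:
                    result.add(gamma)
                    todo.append(gamma)
    return frozenset(result)


def goto2(q, x):
    """goto': strip a leading x from the items of closure'(q)."""
    return frozenset(beta[1:] for beta in closure2(q) if beta and beta[0] == x)


def states_2lr(start):
    """Smallest collection containing start, closed under non-empty goto'."""
    states, todo = {start}, deque([start])
    while todo:
        q = todo.popleft()
        for x in V:
            q2 = goto2(q, x)
            if q2 and q2 not in states:
                states.add(q2)
                todo.append(q2)
    return states


START = frozenset({("S", "<")})
R2LR = states_2lr(START)
```

In this example the two LR states containing $S \to a\, \bullet$ and $S \to S + a\, \bullet$ both collapse to the single state containing only the empty suffix, which is the kind of identification described above.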

The 2LR automaton associated with G can now 
be introduced. 

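Definition 3 $\mathcal{A}_{2LR} = (\Sigma, Q_{2LR}, T_{2LR}, q'_{in}, q'_{fin})$,
where $Q_{2LR}$ is as defined above,
$q'_{in} = (\triangleright, \{S\triangleleft\})$,
$q'_{fin} = (S, goto'(\{S\triangleleft\}, S))$, and $T_{2LR}$
contains:

(i) $(X, q) \stackrel{a}{\mapsto} (X, q)\, (a, q')$, for every
$a \in \Sigma$ and $(X, q), (a, q') \in Q_{2LR}$ such that
$q' = goto'(q, a)$;

(ii) $(X, q) \stackrel{\varepsilon}{\mapsto} (X, q)\, (\varepsilon)$,
for every $(X, q) \in Q_{2LR}$ such that
$\varepsilon \in closure'(q)$;

(iii) $(X, q)\, (\beta) \stackrel{\varepsilon}{\mapsto} (X\beta)$,
for every $(X, q) \in Q_{2LR}$ and $\beta \in q$;

(iv) $(X, q)\, (\alpha) \stackrel{\varepsilon}{\mapsto} (X, q)\, (A, q')$,
for every $(X, q), (A, q') \in Q_{2LR}$ and
$(A \to \alpha) \in P^\dagger$ such that $q' = goto'(q, A)$.
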

Note that in the case of a reduce/reduce conflict with two
grammar rules sharing some suffix in the right-hand side, the
gathering steps of $\mathcal{A}_{2LR}$ will treat both rules
simultaneously, until the parts of the right-hand sides are
reached where the two rules differ. (See Leermakers (1992a) for a
similar sharing of computation for common suffixes.)

An interesting fact is that the automaton $\mathcal{A}_{2LR}$ is
very similar to the automaton $\mathcal{A}_{LR}$ constructed for a
grammar transformed by the transformation $\tau_{two}$ given by
Nederhof and Satta (1994).²



5 The algorithm 

This section presents a tabular LR parser, which is the main
result of this paper. The parser is derived from the 2LR automata
introduced in the previous section. Following the general approach
presented by Leermakers (1989), we simulate computations of these
devices using a tabular method, a grammar transformation and a
filtering function.

² For the earliest mention of this transformation, we have
encountered pointers to Schauerte (1973). Regrettably, we have as
yet not been able to get hold of a copy of this paper.

We make use of a tabular parsing algorithm which is basically an
asynchronous version of the CYK algorithm, as presented by
Harrison (1978), extended to productions of the forms $A \to B$
and $A \to \varepsilon$ and with a left-to-right filtering
condition. The algorithm uses a parse table consisting of a
0-indexed square array $U$. The indices represent positions in
the input string. We define $U_i$ to be $\bigcup_{k \leq i} U_{k,i}$.

Computation of the entries of $U$ is moderated by a filtering
process. This process makes use of a function pred from $2^N$ to
$2^N$, specific to a certain context-free grammar. We have a
certain nonterminal $A_{init}$ which is initially inserted in
$U_{0,0}$ in order to start the recognition process.

We are now ready to give a formal specification of 
the tabular algorithm. 

Algorithm 1 Let $G = (\Sigma, N, P, S)$ be a CFG in binary form,
let pred be a function from $2^N$ to $2^N$, let $A_{init}$ be the
distinguished element from $N$, and let
$v = a_1 a_2 \cdots a_n \in \Sigma^*$ be an input string. We
compute the least $(n + 1) \times (n + 1)$ table $U$ such that
$A_{init} \in U_{0,0}$ and

(i) $A \in U_{j-1,j}$
if $(A \to a_j) \in P$, $A \in pred(U_{j-1})$;

(ii) $A \in U_{j,j}$
if $(A \to \varepsilon) \in P$, $A \in pred(U_j)$;

(iii) $A \in U_{i,j}$
if $B \in U_{i,k}$, $C \in U_{k,j}$, $(A \to BC) \in P$,
$A \in pred(U_i)$;

(iv) $A \in U_{i,j}$
if $B \in U_{i,j}$, $(A \to B) \in P$, $A \in pred(U_i)$.

The string has been accepted when $S \in U_{0,n}$.
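The four clauses of Algorithm 1 can be read as a fixpoint computation over the table. The sketch below is one naive way to realise it, under assumptions of our own: the function and parameter names, the toy binary-form grammar, the dummy initial nonterminal `"Init"` and the trivial filter `no_filter` (which lets every nonterminal through) are all hypothetical. With the paper's method, the grammar would be the 2LR cover and pred the filtering function derived from the automaton.

```python
def recognize(rules, nonterminals, pred, a_init, start, v):
    """Compute the least table U of Algorithm 1 and test start in U[0, n]."""
    n = len(v)
    # U[i, j] holds nonterminals spanning input positions i..j (i <= j).
    U = {(i, j): set() for i in range(n + 1) for j in range(i, n + 1)}
    U[0, 0].add(a_init)
    changed = True

    def column(i):
        # U_i: the union of all entries U[k, i] with k <= i.
        return set().union(*(U[k, i] for k in range(i + 1)))

    def add(i, j, a):
        nonlocal changed
        # Every clause filters on pred applied to U_i, where i is the
        # left index of the entry being extended.
        if a not in U[i, j] and a in pred(column(i)):
            U[i, j].add(a)
            changed = True

    while changed:  # iterate until no clause adds anything new
        changed = False
        for lhs, rhs in rules:
            if len(rhs) == 0:                              # clause (ii)
                for j in range(n + 1):
                    add(j, j, lhs)
            elif len(rhs) == 1 and rhs[0] not in nonterminals:  # clause (i)
                for j in range(1, n + 1):
                    if v[j - 1] == rhs[0]:
                        add(j - 1, j, lhs)
            elif len(rhs) == 1:                            # clause (iv)
                for (i, j), cell in U.items():
                    if rhs[0] in cell:
                        add(i, j, lhs)
            else:                                          # clause (iii)
                b, c = rhs
                for i in range(n + 1):
                    for k in range(i, n + 1):
                        if b in U[i, k]:
                            for j in range(k, n + 1):
                                if c in U[k, j]:
                                    add(i, j, lhs)
    return start in U[0, n]


# Hypothetical binary-form grammar for a, a+a, a+a+a, ...
RULES = [("S", ("S", "T")), ("S", ("a",)), ("T", ("P", "A")),
         ("P", ("+",)), ("A", ("a",))]
NONTERMINALS = {"S", "T", "P", "A", "Init"}


def no_filter(tau):
    # Assumption: a trivial pred that performs no filtering at all.
    return NONTERMINALS


def accepts(s):
    return recognize(RULES, NONTERMINALS, no_filter, "Init", "S", list(s))
```

The repeated sweep over all rules is chosen for brevity rather than efficiency; an agenda-driven realisation would visit each combination of entries only once.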

We now specify a grammar transformation, based on the definition
of $\mathcal{A}_{2LR}$.

Definition 4 Let $\mathcal{A}_{2LR} = (\Sigma, Q_{2LR}, T_{2LR}, q'_{in}, q'_{fin})$
be the 2LR automaton associated with a CFG $G$. The 2LR cover
associated with $G$ is the CFG
$C_{2LR}(G) = (\Sigma, Q_{2LR}, P_{2LR}, q'_{fin})$, where the
rules in $P_{2LR}$ are given by:

(i) $(a, q') \to a$,
for every $(X, q) \stackrel{a}{\mapsto} (X, q)\, (a, q') \in T_{2LR}$;

(ii) $(\varepsilon) \to \varepsilon$,
for every $(X, q) \stackrel{\varepsilon}{\mapsto} (X, q)\, (\varepsilon) \in T_{2LR}$;

(iii) $(X\beta) \to (X, q)\, (\beta)$,
for every $(X, q)\, (\beta) \stackrel{\varepsilon}{\mapsto} (X\beta) \in T_{2LR}$;

(iv) $(A, q') \to (\alpha)$,
for every $(X, q)\, (\alpha) \stackrel{\varepsilon}{\mapsto} (X, q)\, (A, q') \in T_{2LR}$.

Observe that there is a direct, one-to-one correspondence between
transitions of $\mathcal{A}_{2LR}$ and productions of
$C_{2LR}(G)$.

The accompanying function pred is defined as follows ($q$, $q'$,
$q''$ range over the stack elements):

$$pred(\tau) = \{q \mid q' q'' \stackrel{\varepsilon}{\mapsto} q \in T_{2LR}\} \;\cup\; \{q \mid q' \in \tau,\ q' \stackrel{z}{\mapsto} q'\, q \in T_{2LR}\} \;\cup\; \{q \mid q' \in \tau,\ q' q'' \stackrel{\varepsilon}{\mapsto} q'\, q \in T_{2LR}\}.$$

The above definition implies that only the tabular equivalents of
the shift, initiate and goto transitions are subject to actual
filtering; the simulation of the gathering transitions does not
depend on elements in $\tau$.

Finally, the distinguished nonterminal from the cover used to
initialize the table is $q'_{in}$. Thus we start with
$(\triangleright, \{S\triangleleft\}) \in U_{0,0}$.

The 2LR cover introduces spurious ambiguity: where some grammar
$G$ would allow a certain number of parses to be found for a
certain input, the grammar $C_{2LR}(G)$ in general allows more
parses. This problem is in part solved by the filtering function
pred. The remaining spurious ambiguity is avoided by a particular
way of constructing the parse trees, described in what follows.

After Algorithm 1 has recognized a given input, the set of all
parse trees can be computed as $tree(q'_{fin}, 0, n)$ where the
function tree, which determines sets of either parse trees or
lists of parse trees for entries in $U$, is recursively defined
by:

(i) $tree((a, q'), i, j)$ is the set $\{a\}$. This set contains a
single parse tree consisting of a single node labelled $a$.

(ii) $tree(\varepsilon, i, i)$ is the set $\{\varepsilon\}$. This
set consists of an empty list of trees.

(iii) $tree(X\beta, i, j)$ is the union of the sets
$T^{k}_{(X\beta),i,j}$, where $i \leq k \leq j$,
$(\beta) \in U_{k,j}$, and there is at least one
$(X, q) \in U_{i,k}$ and $(X\beta) \to (X, q)\, (\beta)$ in
$C_{2LR}(G)$, for some $q$. For each such $k$, select one such
$q$. We define
$T^{k}_{(X\beta),i,j} = \{t \cdot ts \mid t \in tree((X, q), i, k) \wedge ts \in tree(\beta, k, j)\}$.
Each $t \cdot ts$ is a list of trees, with head $t$ and tail $ts$.

(iv) $tree((A, q'), i, j)$ is the union of the sets
$T^{\alpha}_{(A,q'),i,j}$, where $(\alpha) \in U_{i,j}$ is such
that $(A, q') \to (\alpha)$ in $C_{2LR}(G)$. We define
$T^{\alpha}_{(A,q'),i,j} = \{glue(A, ts) \mid ts \in tree(\alpha, i, j)\}$.
The function glue constructs a tree from a fresh root node
labelled $A$ and the trees in list $ts$ as immediate subtrees.

We emphasize that in the third clause above, one should not
consider more than one $q$ for given $k$ in order to prevent
spurious ambiguity. (In fact, for fixed $X, i, k$ and for
different $q$ such that $(X, q) \in U_{i,k}$,
$tree((X, q), i, k)$ yields the exact same set of trees.) With
this proviso, the degree of ambiguity, i.e. the number of parses
found by the algorithm for any input, is reduced to exactly that
of the source grammar.

A practical implementation would construct the parse trees
on-the-fly, attaching them to the table entries, allowing packing
and sharing of subtrees (cf. the literature on parse forests
(Tomita, 1986; Billot and Lang, 1989)). Our algorithm actually
only needs one (packed) subtree for several $(X, q) \in U_{i,k}$
with fixed $X, i, k$ but different $q$. The resulting parse
forests would then be optimally compact, contrary to some other
LR-based tabular algorithms, as pointed out by Rekers (1992),
Nederhof (1993) and Nederhof (1994b).



6 Analysis of the algorithm 

In this section, we investigate how the steps performed by
Algorithm 1 (applied to the 2LR cover) relate to those performed
by $\mathcal{A}_{2LR}$, for the same input.

We define a subrelation $\models^+$ of $\vdash^+$ as:
$(\delta, uw) \models^+ (\delta\delta', w)$ if and only if
$(\delta, uw) = (\delta, z_1 z_2 \cdots z_m w) \vdash (\delta\delta_1, z_2 \cdots z_m w) \vdash \ldots \vdash (\delta\delta_m, w) = (\delta\delta', w)$,
for some $m \geq 1$, where $|\delta_k| > 0$ for all $k$,
$1 \leq k \leq m$. Informally, we have
$(\delta, uw) \models^+ (\delta\delta', w)$ if configuration
$(\delta\delta', w)$ can be reached from $(\delta, uw)$ without
the bottom-most part $\delta$ of the intermediate stacks being
affected by any of the transitions; furthermore, at least one
element is pushed on top of $\delta$.

The following characterization relates the automaton
$\mathcal{A}_{2LR}$ and Algorithm 1 applied to the 2LR cover.
Symbol $q \in Q_{2LR}$ is eventually added to $U_{i,j}$ if and
only if for some $\delta$:

$$(q'_{in}, a_1 \ldots a_n) \vdash^* (\delta, a_{i+1} \ldots a_n) \models^+ (\delta q, a_{j+1} \ldots a_n).$$

In words, $q$ is found in entry $U_{i,j}$ if and only if, at
input position $j$, the automaton would push some element $q$ on
top of some lower part of the stack $\delta$ that remains
unaffected while the input from $i$ to $j$ is being read.

The above characterization, whose proof is not reported here, is
the justification for calling the resulting algorithm tabular LR
parsing. In particular, for a grammar for which
$\mathcal{A}_{2LR}$ is deterministic, i.e. for an LR(0) grammar,
the number of steps performed by $\mathcal{A}_{2LR}$ and the
number of steps performed by the above algorithm are exactly the
same. In the case of grammars which are not LR(0), the tabular LR
algorithm is more efficient than, for example, a backtrack
realisation of $\mathcal{A}_{2LR}$.

For determining the order of the time complexity of our
algorithm, we look at the most expensive step, which is the
computation of an element $(X\beta) \in U_{i,j}$ from two
elements $(X, q) \in U_{i,k}$ and $(\beta) \in U_{k,j}$, through
$(X, q)\, (\beta) \stackrel{\varepsilon}{\mapsto} (X\beta) \in T_{2LR}$.
In a straightforward realisation of the algorithm, this step can
be applied $O(|T_{2LR}| \cdot |v|^3)$ times (once for each
$i, k, j$ and each transition), each step taking a constant
amount of time. We conclude that the time complexity of our
algorithm is $O(|T_{2LR}| \cdot |v|^3)$.

As far as space requirements are concerned, each set $U_{i,j}$ or
$U_i$ contains at most $|Q_{2LR}|$ elements. (One may assume an
auxiliary table storing each $U_i$.) This results in a space
complexity $O(|Q_{2LR}| \cdot |v|^2)$.

The entries in the table represent single stack elements, as
opposed to pairs of stack elements following Lang (1974) and
Leermakers (1989). This has been investigated before by
Nederhof (1994a, p. 25) and Villemonte de la Clergerie (1993,
p. 155).

7 Empirical results 

We have performed some experiments with Algorithm 1 applied to
$\mathcal{A}_{2LR}$ and $\mathcal{A}'_{LR}$, for 4 practical
context-free grammars. For $\mathcal{A}'_{LR}$ a cover was used
analogous to the one in Definition 4; the filtering function
remains the same.

The first grammar generates a subset of the programming language
ALGOL 68 (van Wijngaarden and others, 1975). The second and third
grammars generate a fragment of Dutch, and are referred to as the
CORRie grammar (Vosse, 1994) and the Deltra grammar (Schoorl and
Belder, 1990), respectively. These grammars were stripped of
their arguments in order to convert them into context-free
grammars. The fourth grammar, referred to as the Alvey grammar
(Carroll, 1993), generates a fragment of English and was
automatically generated from a unification-based grammar.

The test sentences have been obtained by automatic generation
from the grammars, using the Grammar Workbench (Nederhof and
Koster, 1992), which uses a random generator to select rules;
therefore these sentences do not necessarily represent input
typical of the applications for which the grammars were written.
Table 1 summarizes the test material.

Our implementation is merely a prototype, which means that
absolute duration of the parsing process is little indicative of
the actual efficiency of more sophisticated implementations.
Therefore, our measurements have been restricted to
implementation-independent quantities, viz. the number of
elements stored in the parse table and the number of elementary
steps performed by the algorithm. In a practical implementation,
such quantities will strongly influence the space and time
complexity, although they do not represent the only determining
factors. Furthermore, all optimizations of the time and space
efficiency have been left out of consideration.

G = (Σ, N, P, S)    |G|     |N|     |P|     |w|
ALGOL 68             783     167     330    13.7
CORRie              1141     203     424    12.3
Deltra              1929     281     703    10.8
Alvey               5072     265    1484    10.7

Table 1: The test material: the four grammars and
some of their dimensions, and the average length of
the test sentences (20 sentences of various length for
each grammar).

                  A'_LR               A_2LR
G              space     time      space     time
ALGOL 68         327      375        234      343
CORRie          7548    28028       5131    22414
Deltra         11772    94824       6526    70333
Alvey            599     1147        354      747

Table 2: Dynamic requirements: average space and
time per sentence.

Table 2 presents the costs of parsing the test sentences. The
first and third columns give the number of entries stored in
table $U$, the second and fourth columns give the number of
elementary steps that were performed.

An elementary step consists of the derivation of one element in
$Q'_{LR}$ or $Q_{2LR}$ from one or two other elements. The
elements that are used in the filtering process are counted
individually. We give an example for the case of
$\mathcal{A}'_{LR}$. Suppose we derive an element
$q' \in U_{i,j}$ from an element
$(A \to \bullet\, \alpha) \in U_{i,j}$, warranted by two elements
$q_1, q_2 \in U_i$, $q_1 \neq q_2$, through pred, in the presence
of
$q_1\, (A \to \bullet\, \alpha) \stackrel{\varepsilon}{\mapsto} q_1\, q' \in T'_{LR}$
and
$q_2\, (A \to \bullet\, \alpha) \stackrel{\varepsilon}{\mapsto} q_2\, q' \in T'_{LR}$.
We then count two parsing steps, one for $q_1$ and one for $q_2$.

Table 2 shows that there is a significant gain in space and time
efficiency when moving from $\mathcal{A}'_{LR}$ to
$\mathcal{A}_{2LR}$.

                       A'_LR                            A_2LR
G            |R_LR|  |Q'_LR|    |T'_LR|     |R_2LR|  |Q_2LR|   |T_2LR|
ALGOL 68        434    1,217     13,844         109      724    12,387
CORRie          600    1,741     22,129         185      821    15,569
Deltra          856    2,785     54,932         260    1,089    37,510
Alvey         3,712    8,784  1,862,492         753    3,065   537,852

Table 3: Static requirements.

Apart from the dynamic costs of parsing, we have also measured
some quantities relevant to the construction and storage of the
two types of tabular LR parser. These data are given in Table 3.

We see that the number of states is strongly reduced with regard
to traditional LR parsing. In the case of the Alvey grammar,
moving from $|\mathcal{R}_{LR}|$ to $|\mathcal{R}_{2LR}|$ amounts
to a reduction to 20.3 %. Whereas time- and space-efficient
computation of $\mathcal{R}_{LR}$ for this grammar is a serious
problem, computation of $\mathcal{R}_{2LR}$ will not be difficult
on any modern computer. Also significant is the reduction from
$|T'_{LR}|$ to $|T_{2LR}|$, especially for the larger grammars.
These quantities correlate with the amount of storage needed for
naive representation of the respective automata.
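The percentage quoted for the Alvey grammar can be checked directly from the state counts in Table 3. The following short sketch does this arithmetic for all four grammars; the dictionary layout and the helper name `reduction_percent` are our own, while the numbers are copied from the table.

```python
# State counts from Table 3: (|R_LR|, |R_2LR|) per grammar.
STATE_COUNTS = {
    "ALGOL 68": (434, 109),
    "CORRie": (600, 185),
    "Deltra": (856, 260),
    "Alvey": (3712, 753),
}


def reduction_percent(before, after):
    """Size of `after` as a percentage of `before`, rounded to one decimal."""
    return round(100.0 * after / before, 1)


for name, (r_lr, r_2lr) in STATE_COUNTS.items():
    print(name, reduction_percent(r_lr, r_2lr))
```

For Alvey this gives 753 / 3712, which rounds to the 20.3 % stated in the text.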

8 Discussion 

Our treatment of tabular LR parsing has two important advantages
over the one by Tomita:

• It is conceptually simpler, because we make use of simple
concepts such as a grammar transformation and the well-understood
CYK algorithm, instead of a complicated mechanism working on
graph-structured stacks.

• Our algorithm requires fewer LR states. This leads to faster
parser generation, to smaller parsers, and to reduced time and
space complexity of parsing itself.

The conceptual simplicity of our formulation of tabular LR
parsing allows comparison with other tabular parsing techniques,
such as Earley's algorithm (Earley, 1970) and tabular left-corner
parsing (Nederhof, 1993), based on implementation-independent
criteria. This is in contrast to experiments reported before
(e.g. by Shann (1991)), which treated tabular LR parsing
differently from the other techniques.

The reduced time and space complexities reported in the previous
section pertain to the tabular realisation of two parsing
techniques, expressed by the automata $\mathcal{A}'_{LR}$ and
$\mathcal{A}_{2LR}$. The tabular realisation of the former
automata is very close to a variant of Tomita's algorithm by
Kipps (1991). The objective of our experiments was to show that
the automata $\mathcal{A}_{2LR}$ provide a better basis than
$\mathcal{A}'_{LR}$ for tabular LR parsing with regard to space
and time complexity.

Parsing algorithms that are not based on the LR technique have
however been left out of consideration, and so were techniques
for unification grammars and techniques incorporating
finite-state processes.³

Theoretical considerations (Leermakers, 1989; Schabes, 1991;
Nederhof, 1994b) have suggested that for natural language
parsing, LR-based techniques may not necessarily be superior to
other parsing techniques, although convincing empirical data to
this effect has never been shown. This issue is difficult to
resolve because so much of the relative efficiency of the
different parsing techniques depends on particular grammars and
particular input, as well as on particular implementations of the
techniques. We hope the conceptual framework presented in this
paper may at least partly alleviate this problem.

³ As remarked before by Nederhof (199; ), the algorithms by
Schabes (1991) and Leermakers (1989) are not really related to LR
parsing, although some notation used in these papers suggests
otherwise.